



Neural Information Processing Systems

We compared RAPS with the latest state-of-the-art work that incorporates DMPs with Deep RL: Neural Dynamic Policies [6]. One question that may arise is: how useful is the dummy primitive? We run an experiment with and without the dummy primitive to evaluate its impact, and find that the dummy primitive improves performance significantly. Each image depicts the solution of one of the tasks; we omit the bottom burner task because its goal is the same as the top burner task's, just with a different dial to turn. For the sequential multi-task version of the environment, the goal in a single episode is to complete four different subtasks.




For our specific algorithm, TD3+BC, given that the performance gain over existing state-of-the-art methods is minimal, it would be surprising to see our paper result in significant impact in these contexts. For CQL we modify the GitHub defaults for the actor learning rate and use a fixed α rather than the Lagrange variant, matching the hyperparameters defined in their paper (which differ from the GitHub defaults), as we found the original hyperparameters performed better. We can also choose λ by considering the value estimate of the agent: if we see divergence in the value function due to extrapolation error [Fujimoto et al., 2019], then we need to decrease λ so that the BC term is weighted more highly. We use the default hyperparameters in the Fisher-BRC GitHub. Figure 1: Percent difference in performance of offline RL algorithms when adding normalization to state features.
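The role of λ and of state normalization described above can be illustrated with a minimal NumPy sketch. This assumes the standard TD3+BC formulation, where the actor loss combines a λ-scaled Q term with a behavior-cloning (BC) penalty and λ is normalized by the mean absolute Q value; the function names and the ε constants here are illustrative, not from the paper's released code.

```python
import numpy as np

def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Sketch of the TD3+BC actor objective: maximize Q while staying
    close to the dataset actions via a BC penalty. Decreasing the
    effective lam weights the BC term more highly, which is the remedy
    when the value function diverges from extrapolation error."""
    # lam normalizes the Q term by the average absolute Q value,
    # so a single alpha controls the RL/BC trade-off across tasks.
    lam = alpha / (np.mean(np.abs(q_values)) + 1e-8)
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)
    return -lam * np.mean(q_values) + bc_term

def normalize_states(states, eps=1e-8):
    """Per-dimension state-feature normalization over the offline
    dataset, as evaluated in Figure 1."""
    mean = states.mean(axis=0)
    std = states.std(axis=0)
    return (states - mean) / (std + eps)
```

When the policy's actions match the dataset actions exactly, the BC term vanishes and the loss reduces to the λ-scaled Q term alone, which makes the trade-off easy to sanity-check in isolation.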